Learning Actions Models: Qualitative Approach

Authors

  • Thomas Bolander
  • Nina Gierasimczuk
Abstract

In dynamic epistemic logic, actions are described using action models. In this paper we introduce a framework for studying learnability of action models from observations. We present first results concerning propositional action models. First we check two basic learnability criteria: finite identifiability (conclusively inferring the appropriate action model in finite time) and identifiability in the limit (inconclusive convergence to the right action model). We show that deterministic actions are finitely identifiable, while non-deterministic actions require more learning power: they are identifiable in the limit. We then move on to a particular learning method, which proceeds via restriction of a space of events within a learning-specific action model. This way of learning closely resembles the well-known update method from dynamic epistemic logic. We introduce several different learning methods suited for finite identifiability of particular types of deterministic actions.

Introduction

Dynamic epistemic logic (DEL) allows analyzing knowledge change in a systematic way. The static component of a situation is represented by an epistemic model, while the structure of the dynamic component is encoded in an action model. An action model can be applied to the epistemic model via the so-called product update operation, resulting in a new, up-to-date epistemic model of the situation after the action has been executed. A language, interpreted on epistemic models, allows expressing the conditions under which an action takes effect (so-called preconditions) and the effects of such actions (so-called postconditions). This setting is particularly useful for modeling the process of epistemic planning (see [7,1]): one can ask which sequence of actions should be executed in order for a given epistemic formula to hold in the epistemic model after the actions are executed.

The purpose of this paper is to investigate possible learning mechanisms involved in discovering the ‘internal structure’ of actions on the basis of their executions. In other words, we are concerned with qualitative learning of action models on the basis of observations of pairs of the form (initial state, resulting state). We analyze learnability of action models in the context of two learning conditions: finite identifiability (conclusively inferring the appropriate action model in finite time) and identifiability in the limit (inconclusive convergence to the right action model). The paper draws on results from formal learning theory applied to DEL (see [11,13,12]).

Learning of action models is highly relevant in the context of epistemic planning. A planning agent might not initially know the effects of her actions, so she will initially not be able to plan to achieve any goals. However, if she can learn the relevant action models through observing the effects of the actions (either by executing the actions herself, or by observing other agents), she will eventually learn how to plan. Our ultimate goal is to integrate learning of actions into (epistemic) planning agents. In this paper, we seek to lay the foundations for this goal by studying learnability of action models from streams of observations.

The structure of the paper is as follows. In Section 1 we recall the basic concepts and notation concerning action models and action types in DEL. In Section 2 we specify our learning framework and provide general learnability results.
In Section 3 we study particular learning functions, which proceed via updating action models with new information. Finally, in Section 4 we indicate how to lift our results from the level of individual action learning to that of action library learning. In the end we briefly discuss related and further work.

1 Languages and action types

Let us first present the basic notions required for the rest of the article (see [6,8] for more details). Following the conventions of automated planning, we take the set of atomic propositions and the set of actions to be finite. Given a finite set P of atomic propositions, we define the (single-agent) epistemic language over P, Lepis(P), by the following BNF:

    φ ::= p | ¬φ | φ ∧ φ | Kφ,   where p ∈ P.

The language Lprop(P) is the propositional sublanguage without the Kφ clause. When P is clear from the context, we write Lepis and Lprop instead of Lepis(P) and Lprop(P), respectively. By means of the standard abbreviations we introduce the additional symbols →, ∨, ↔, ⊥, and ⊤.

Definition 1 (Epistemic models and states). An epistemic model over a set of atomic propositions P is M = (W, R, V), where W is a finite set of worlds, R ⊆ W × W is an equivalence relation called the indistinguishability relation, and V : P → P(W) is a valuation function. An epistemic state is a pointed epistemic model (M, w) consisting of an epistemic model M = (W, R, V) and a distinguished world w ∈ W called the actual world. A propositional state (or simply state) over P is a subset of P (or, equivalently, a propositional valuation ν : P → {0, 1}).

We identify propositional states and singleton epistemic models via the following canonical isomorphism: a propositional state s ⊆ P is isomorphic to the epistemic model M = ({w}, {(w, w)}, V) where V(p) = {w} if p ∈ s and V(p) = ∅ otherwise. Truth in epistemic states (M, w) with M = (W, R, V) (and hence in propositional states) is defined as usual and hence omitted.

Dynamic epistemic logic (DEL) introduces the concept of an action model for modelling the changes to states brought about by the execution of actions [6]. We here use a variant that includes postconditions [8,7], which means that actions can have both epistemic effects (changing the beliefs of agents) and ontic effects (changing the factual state of affairs).

Definition 2 (Action models). An action model over a set of atomic propositions P is A = (E, Q, pre, post), where E is a finite set of events; Q ⊆ E × E is an equivalence relation called the indistinguishability relation; pre : E → Lepis(P) assigns to each event a precondition; and post : E → Lprop(P) assigns to each event a postcondition. Postconditions are conjunctions of literals (atomic propositions and their negations) or ⊤.¹ dom(A) = E denotes the domain of A. The set of all action models over P is denoted Actions(P).

Intuitively, events correspond to the ways in which an action changes the epistemic state, and the indistinguishability relation encodes (an agent's) ability to recognize the difference between those different ways. In an event e, pre(e) specifies what conditions have to be satisfied for it to take effect, and post(e) specifies its outcome.

Example 1. Consider the action of tossing a coin. It can be represented by the following action model (h means that the coin is facing heads up):

    A =    e1 : 〈⊤, h〉        e2 : 〈⊤, ¬h〉

We label each event by a pair whose first argument is the event's precondition and whose second argument is its postcondition. Hence, formally we have A = (E, Q, pre, post) with E = {e1, e2}, Q the identity relation on E, pre(e1) = pre(e2) = ⊤, post(e1) = h, and post(e2) = ¬h. The action model encodes that tossing the coin will either make h true (e1) or make h false (e2).
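To make Definitions 1 and 2 and the coin-toss example concrete, here is a minimal Python sketch of one possible representation; it is not from the paper. Formulas are deliberately simplified: preconditions are encoded as Boolean predicates on pointed models, and postconditions as partial maps from atoms to truth values (conjunctions of literals, with the empty map standing for ⊤). All names (EpistemicModel, ActionModel, holds, toss) are our own.

from dataclasses import dataclass

@dataclass
class EpistemicModel:
    """Definition 1: an epistemic model M = (W, R, V)."""
    worlds: frozenset        # W: finite set of worlds
    relation: frozenset      # R ⊆ W × W: indistinguishability (an equivalence relation)
    valuation: dict          # V: P -> P(W), mapping each atom to the set of worlds where it is true

@dataclass
class ActionModel:
    """Definition 2: an action model A = (E, Q, pre, post)."""
    events: frozenset        # E: finite set of events
    relation: frozenset      # Q ⊆ E × E: indistinguishability between events
    pre: dict                # pre: E -> precondition, here encoded as a predicate (model, world) -> bool
    post: dict               # post: E -> postcondition, here encoded as a dict atom -> bool ({} stands for ⊤)

def holds(model: EpistemicModel, world, atom: str) -> bool:
    """(M, w) ⊨ p for an atomic proposition p."""
    return world in model.valuation.get(atom, frozenset())

# Example 1: the coin-toss action model. Both events have precondition ⊤ (always
# executable); e1 makes h true and e2 makes h false. Q is the identity relation,
# so the two events are distinguishable.
toss = ActionModel(
    events=frozenset({"e1", "e2"}),
    relation=frozenset({("e1", "e1"), ("e2", "e2")}),
    pre={"e1": lambda m, w: True, "e2": lambda m, w: True},
    post={"e1": {"h": True}, "e2": {"h": False}},
)

Encoding preconditions semantically (as predicates) sidesteps writing a parser and evaluator for Lepis; it suffices for illustrating the product update defined next.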
Definition 3 (Product update). Let M = (W, R, V) and A = (E, Q, pre, post) be an epistemic model and an action model (over a set of atomic propositions P), respectively. The product update of M with A is the epistemic model M ⊗ A = (W′, R′, V′), where

  • W′ = {(w, e) ∈ W × E | (M, w) ⊨ pre(e)};
  • R′ = {((w, e), (v, f)) ∈ W′ × W′ | wRv and eQf};
  • V′(p) = {(w, e) ∈ W′ | post(e) ⊨ p, or (M, w) ⊨ p and post(e) ⊭ ¬p}.

For e ∈ dom(A), we define M ⊗ e = M ⊗ (A ↾ {e}). The product update M ⊗ A represents the result of executing the action A in the state(s) represented by M.

Example 2. Continuing Example 1, consider a situation of an agent seeing a coin lying heads up, i.e., the singleton epistemic state M = ({w}, {(w, w)}, V) with V(h) = {w}. Let us now calculate the result of executing the coin toss in this model:

    M ⊗ A =    (w, e1) : h        (w, e2) :

Here each world is labelled by the propositions that are true at it: h is true at (w, e1) and false at (w, e2), and the two resulting worlds are distinguishable because the events e1 and e2 are.

We say that two action models A1 and A2 are equivalent, written A1 ≡ A2, if for any epistemic model M, M ⊗ A1 ↔ M ⊗ A2, where ↔ denotes standard bisimulation on epistemic models [17].

¹ We are here using the postcondition conventions from [7], which are slightly nonstandard. Any action model with standard postconditions can be turned into one of our type, but it might become exponentially larger in the process [8,7].
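Continuing the sketch above (it reuses EpistemicModel, ActionModel, holds, and the coin-toss model toss defined there), the following is one way the product update of Definition 3 could be computed under that simplified encoding of formulas; the final lines reproduce the calculation of Example 2. Again, this is an illustrative sketch, not the paper's implementation.

# Reuses EpistemicModel, ActionModel, holds and toss from the previous sketch.

def product_update(m: EpistemicModel, a: ActionModel) -> EpistemicModel:
    """Definition 3: M ⊗ A, with preconditions as predicates and postconditions as literal maps."""
    # W' = {(w, e) | (M, w) ⊨ pre(e)}
    new_worlds = frozenset((w, e) for w in m.worlds for e in a.events if a.pre[e](m, w))
    # R' relates (w, e) and (v, f) iff wRv and eQf
    new_relation = frozenset(
        ((w, e), (v, f))
        for (w, e) in new_worlds for (v, f) in new_worlds
        if (w, v) in m.relation and (e, f) in a.relation
    )
    # V'(p): p holds at (w, e) iff post(e) makes p true, or p held at w and post(e) does not make p false
    atoms = set(m.valuation) | {p for literals in a.post.values() for p in literals}
    new_valuation = {
        p: frozenset(
            (w, e) for (w, e) in new_worlds
            if a.post[e].get(p) is True or (holds(m, w, p) and a.post[e].get(p) is not False)
        )
        for p in atoms
    }
    return EpistemicModel(new_worlds, new_relation, new_valuation)

# Example 2: a singleton model in which the coin lies heads up, then toss it.
heads_up = EpistemicModel(
    worlds=frozenset({"w"}),
    relation=frozenset({("w", "w")}),
    valuation={"h": frozenset({"w"})},
)
result = product_update(heads_up, toss)
print(sorted(result.worlds))          # [('w', 'e1'), ('w', 'e2')]
print(sorted(result.valuation["h"]))  # [('w', 'e1')]: h holds only in the world produced by e1

The printed valuation matches the display in Example 2: h holds only at (w, e1), and the two resulting worlds are not R′-related, so the agent can tell the two outcomes apart.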


Publication date: 2015